Abstract:
A review of technical and perceptual factors in hearing aid technology, research and development is provided, covering current commercial solutions, the underlying models of hearing loss used in hearing devices, and emerging technical solutions for future hearing aid functionalities. A chain of techniques has provided incremental but steady increases in user benefit, e.g. in the fields of hearing aid amplification, feedback suppression, dynamic compression, noise reduction and situation adaptation. Models of the perceptual consequences of sensorineural hearing impairment describe the effects at the acoustical, neurosensory and cognitive levels, and provide the framework for compensatory (or even substitutional) functions of hearing aids in terms of the attenuation, distortion and neural components of the hearing loss. A major factor is the need for strong individualisation of hearing aid solutions, calling for an appropriate assessment of the different sensorineural components of a hearing loss, especially with respect to bilateral and binaural hearing aid solutions.
Abstract:
Brief deviations of interaural correlation (IAC) can provide valuable cues for detection, segregation and localization of acoustic signals. This study investigated the processing of such "binaural gaps" in continuously running noise (100-2000 Hz), in comparison to silent "monaural gaps", by measuring late auditory evoked potentials (LAEPs) and perceptual thresholds with novel, iteratively optimized stimuli. Mean perceptual binaural gap duration thresholds exhibited a major asymmetry: they were substantially shorter for uncorrelated gaps in correlated and anticorrelated reference noise (1.75 ms and 4.1 ms) than for correlated and anticorrelated gaps in uncorrelated reference noise (26.5 ms and 39.0 ms). The thresholds also showed a minor asymmetry: they were shorter in the positive than in the negative IAC range. The mean behavioral threshold for monaural gaps was 5.5 ms. For all five gap types, the amplitude of LAEP components N1 and P2 increased linearly with the logarithm of gap duration. While perceptual and electrophysiological thresholds matched for monaural gaps, LAEP thresholds were about twice as long as perceptual thresholds for uncorrelated gaps, but half as long for correlated and anticorrelated gaps. Nevertheless, LAEP thresholds showed the same asymmetries as perceptual thresholds. For gap durations below 30 ms, LAEPs were dominated by the processing of the leading edge of a gap. For longer gap durations, in contrast, both the leading and the lagging edge of a gap contributed to the evoked response. Formulae for the equivalent rectangular duration (ERD) of the binaural system's temporal window were derived for three common window shapes. The psychophysical ERD was 68 ms for diotic and about 40 ms for anti- and uncorrelated noise. After a nonlinear Z-transform of the stimulus IAC prior to temporal integration, ERDs were about 10 ms for reference correlations of 1 and 80 ms for uncorrelated reference. 
Hence, a physiologically motivated peripheral nonlinearity changed the rank order of ERDs across experimental conditions in a plausible manner. (C) 2015 Elsevier B.V. All rights reserved.
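The "nonlinear Z-transform of the stimulus IAC" referred to above is, presumably, the Fisher z-transform z = atanh(r), which expands correlation differences near |IAC| = 1 and compresses them near 0. A minimal sketch (an illustration, not the paper's implementation; the window length is an arbitrary choice here):

```python
import numpy as np

def windowed_iac(left, right, win):
    """Estimate interaural correlation (IAC) in consecutive
    non-overlapping windows of `win` samples."""
    n = min(len(left), len(right)) // win
    iac = np.empty(n)
    for i in range(n):
        l = left[i * win:(i + 1) * win]
        r = right[i * win:(i + 1) * win]
        iac[i] = np.corrcoef(l, r)[0, 1]
    return iac

def fisher_z(iac, clip=0.9999):
    """Fisher z-transform of the IAC: a candidate peripheral
    nonlinearity applied prior to temporal integration. Clipping
    keeps atanh finite at |IAC| = 1."""
    return np.arctanh(np.clip(iac, -clip, clip))
```

Applying such a transform before a temporal-integration window changes which IAC deviations dominate the integrated output, which is the sense in which it can re-order the equivalent rectangular durations across conditions.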
Abstract:
The application of machine learning for the development of clinical decision-support systems in audiology provides the potential to improve the objectivity and precision of clinical experts’ diagnostic decisions. However, for successful clinical application, such a tool needs to be accurate, as well as accepted and trusted by physicians. In the field of audiology, large amounts of patients’ data are being measured, but these are distributed over local clinical databases and are heterogeneous with respect to the applied assessment tools. For the purpose of integrating across different databases, the Common Audiological Functional Parameters (CAFPAs) were recently established as abstract representations of the contained audiological information describing relevant functional aspects of the human auditory system. As an intermediate layer in a clinical decision-support system for audiology, the CAFPAs aim at maintaining interpretability to the potential users. Thus far, the CAFPAs were derived by experts from audiological measures. For designing a clinical decision-support system, in a next step the CAFPAs need to be automatically derived from available data of individual patients. Therefore, the present study aims at predicting the expert generated CAFPA labels using three different machine learning models, namely the lasso regression, elastic nets, and random forests. Furthermore, the importance of different audiological measures for the prediction of specific CAFPAs is examined and interpreted. The established models are then used to predict CAFPAs for unlabeled data not seen by experts. Prediction of unlabeled cases is evaluated by means of model-based clustering methods. Results indicate an adequate prediction of the ten distinct CAFPAs. All models perform comparably and turn out to be suitable choices for the prediction of CAFPAs. They also generalize well to unlabeled data. 
Additionally, the extracted relevant variables are plausible for the respective CAFPAs, facilitating interpretability of the predictions. Based on the established models, a prototype of a clinical decision-support system in audiology can be implemented and extended towards clinical databases in the future.
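The three-model comparison described above can be sketched with scikit-learn. This is an illustration on synthetic stand-in data, not the study's dataset or hyperparameters: the feature matrix plays the role of audiological measures, and the target plays the role of one expert-generated CAFPA label in [0, 1].

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: audiological measures (X) -> one CAFPA label (y).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))          # e.g. audiogram, speech-test scores
y = np.clip(0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1]
            + 0.05 * rng.standard_normal(200), 0.0, 1.0)

# The three model families named in the abstract (hyperparameters arbitrary).
models = {
    "lasso": Lasso(alpha=0.01),
    "elastic_net": ElasticNet(alpha=0.01, l1_ratio=0.5),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="r2").mean()
          for name, m in models.items()}
```

For the linear models, the fitted coefficients (and for the forest, feature importances) indicate which measures drive each predicted CAFPA, which is the interpretability aspect the abstract emphasizes.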
Abstract:
Speech reception thresholds (SRTs) decrease as target and maskers are spatially separated (spatial release from masking, SRM). The current study systematically assessed how SRTs and SRM for a frontal target in a spatially symmetric masker configuration depend on spectro-temporal masker properties, the availability of short-time interaural level difference (ILD) and interaural time difference (ITD), and informational masking. Maskers ranged from stationary noise to single, interfering talkers and were modified by head-related transfer functions to provide: (i) different binaural cues (ILD, ITD, or both) and (ii) independent maskers in each ear ("infinite ILD"). Additionally, a condition was tested in which only information from short-time spectro-temporal segments of the ear with a favorable signal-to-noise ratio (better-ear glimpses) was presented. For noise-based maskers, ILD, ITD, and spectral changes related to masker location contributed similarly to SRM, while ILD cues played a larger role if temporal modulation was introduced. For speech maskers, glimpsing and perceived location contributed roughly equally and ITD contributed less. The "infinite ILD" condition might suggest better-ear glimpsing limitations resulting in a maximal SRM of 12 dB for maskers with low or absent informational masking. Comparison to binaural model predictions highlighted the importance of short-time processing and helped to clarify the contribution of the different binaural cues and mechanisms. (C) 2017 Acoustical Society of America.
Abstract:
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models. (C) 2016 Acoustical Society of America.
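The SII-type models compared above share a common core: per-band SNRs are clipped to an audible range, mapped to an audibility value, and weighted by band importance. The following is a deliberately simplified sketch of that core idea, not the ANSI S3.5 procedure (the clipping range follows the usual ±15 dB convention, but the band-importance weights are placeholders):

```python
import numpy as np

def simple_sii(speech_db, noise_db, band_importance):
    """Simplified SII-style index: per-band SNR clipped to [-15, +15] dB,
    mapped linearly to [0, 1] audibility, then averaged with
    band-importance weights (normalized to sum to 1)."""
    snr = np.clip(np.asarray(speech_db, float) - np.asarray(noise_db, float),
                  -15.0, 15.0)
    audibility = (snr + 15.0) / 30.0
    w = np.asarray(band_importance, dtype=float)
    return float(np.sum(w / w.sum() * audibility))
```

Such an index is purely energetic by construction, which is why models of this family capture short-term energetic masking well but miss the amplitude-modulation and informational masking effects reported above.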
Abstract:
The paper describes a system for automatic speech recognition (ASR) that is benchmarked with data of the 3rd CHiME challenge, a dataset comprising distant microphone recordings of noisy acoustic scenes in public environments. The proposed ASR system employs various methods to increase recognition accuracy and noise robustness. Two different multi-channel speech enhancement techniques are used to eliminate interfering sounds in the audio stream. One speech enhancement method aims at separating the target speaker's voice from background sources based on non-negative matrix factorization (NMF) using variational Bayesian (VB) inference to estimate NMF parameters. The second technique is based on a time-varying minimum variance distortionless response (MVDR) beamformer that uses spatial information to suppress sound signals not arriving from a desired direction. Prior to speech enhancement, a microphone channel failure detector is applied that is based on cross-comparing channels using a modulation-spectral representation of the speech signal. ASR feature extraction employs the amplitude modulation filter bank (AMFB), which incorporates prior knowledge about speech to analyze its temporal dynamics. AMFBs outperform the commonly used frame splicing technique of filter bank features in conjunction with a deep neural network (DNN) based ASR system, which denotes an equivalent data-driven approach to extract modulation-spectral information. In addition, features are speaker adapted, a recurrent neural network (RNN) is employed for language modeling, and hypotheses of different ASR systems are combined to further enhance the recognition accuracy. The proposed ASR system achieves an absolute word error rate (WER) of 5.67% on the real evaluation test data, which is 0.16% lower compared to the best score reported within the 3rd CHiME challenge.
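The MVDR beamformer mentioned above computes, for each frequency bin, weights that pass the look direction undistorted while minimizing output power: w = R⁻¹d / (dᴴR⁻¹d), with R the noise spatial covariance and d the steering vector. A minimal narrowband sketch (not the challenge system's time-varying implementation; diagonal loading is a common regularization, with an arbitrary amount here):

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-3):
    """MVDR weights for one frequency bin.
    R: (M, M) noise spatial covariance (Hermitian PSD);
    d: (M,) steering vector toward the desired direction.
    Diagonal loading regularizes the matrix inversion."""
    M = R.shape[0]
    R_reg = R + diag_load * (np.trace(R).real / M) * np.eye(M)
    num = np.linalg.inv(R_reg) @ d       # R^-1 d
    return num / (d.conj() @ num)        # normalize: w^H d = 1

# The distortionless constraint w^H d = 1 holds by construction,
# so the desired direction is passed with unit gain.
```

The output of the beamformer for a multichannel STFT frame x is then simply y = wᴴx per bin.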
Abstract:
Developing and selecting hearing aids is a time-consuming process which is simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated benefits of hearing aid algorithms with root-mean-squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is obstructive when the signals have to be recorded. We propose and evaluate a data-reduced FADE version (DARF) which facilitates simulations with signals that cannot be processed digitally, but can only be recorded in real time. DARF simulates one speech recognition threshold (SRT) with about 30 min of recorded and processed signals of the (German) matrix sentence test. Benchmark experiments comparing DARF and standard FADE exhibited small differences for stationary maskers (1 dB), but larger differences with strongly fluctuating maskers (5 dB). Hearing impairment and hearing aid algorithms seemed to reduce the differences. Hearing aid benefits were simulated in terms of speech recognition with three pairs of real hearing aids in silence (>8 dB), in stationary and fluctuating maskers with co-located speech and noise (stat. 2 dB; fluct. 6 dB), and with spatially separated speech and noise (stat. >8 dB; fluct. 8 dB). The simulations were plausible in comparison to data from the literature, but a comparison with empirical data is still open. DARF facilitates objective SRT simulations with real devices with unknown signal processing in real environments. Yet, a validation of DARF for devices with unknown signal processing is still pending, since it was only tested with three similar devices. Nonetheless, DARF could be used for improving, developing, or model-based fitting of hearing aids.
Abstract:
Objectives: Normalizing perceived loudness is an important rationale for gain adjustments in hearing aids. It has been demonstrated that gains required for restoring normal loudness perception for monaural narrowband signals can lead to higher-than-normal loudness in listeners with hearing loss, particularly for binaural broadband presentation. The present study introduces a binaural bandwidth-adaptive dynamic compressor (BBDC) that can apply different gains for narrow- and broadband signals. It was hypothesized that normal perceived loudness for a broad variety of signals could be restored for listeners with mild to moderate high-frequency hearing loss by applying individual signal-dependent gain corrections.
Abstract:
Normal-hearing (NH) listeners are able to localize sound sources with extraordinary accuracy through interaural cues, most importantly interaural time differences (ITDs) in the temporal fine structure. Bilateral cochlear implant (CI) users are also able to localize sound sources, yet generally at lower accuracy than NH listeners. The gap in performance can in part be attributed to current CI systems not faithfully transmitting interaural cues, especially ITDs. With the introduction of binaurally linked CI systems, the presentation of ITD cues for bilateral CI users is foreseeable. The current study therefore investigated extent-of-lateralization percepts elicited in bilateral CI listeners when presented with single-electrode pulse-trains carrying controlled ITD cues. The results were compared against NH listeners listening to broadband stimuli as well as simulations of CI listening. Broadband stimuli in NH listeners were perceived as fully lateralized within the natural ITD range. Using simulated as well as real CI stimuli, however, only a fraction of the full extent of lateralization range was covered by natural ITDs. The maximum extent of lateralization was reached at ITDs as large as twice the natural limit. The results suggest that ITD-enhancement might be a viable option for improving localization abilities with future binaural CI systems. (C) 2017 Acoustical Society of America.
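ITDs such as those imposed on the stimuli above can be estimated from a binaural signal as the lag of the interaural cross-correlation peak; the natural ITD range for a human head is roughly ±700 μs. A minimal sketch (illustrative only; real systems estimate ITD per frequency band and frame):

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.001):
    """Estimate the ITD (in seconds) as the cross-correlation lag
    with the largest value, searched within +/- max_itd seconds.
    A positive result means the right channel lags the left."""
    max_lag = int(max_itd * fs)
    n = min(len(left), len(right))
    lags = np.arange(-max_lag, max_lag + 1)
    cc = [np.dot(left[max(0, -k):n - max(0, k)],
                 right[max(0, k):n - max(0, -k)]) for k in lags]
    return lags[int(np.argmax(cc))] / fs
```

In the enhancement scheme suggested by the abstract, an estimated natural ITD would be mapped onto a larger presented ITD (e.g. stretched toward twice the natural limit) before electrode stimulation, to cover the full extent-of-lateralization range.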